[HIPDNN] Tolerance lookup followup by bghimireamd · Pull Request #8752 · ROCm/rocm-libraries

bghimireamd · 2026-06-24T05:00:11Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

…oading Bundle discovery scans integration_test_bundles/ for .meta.json files, derives GTest names from folder structure, and eagerly loads graph + tensor data. IntegrationTestBundle models bundles as facets with optional golden outputs. BundleRegistration wires discovered bundles into GTest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…model Verification modes (auto|golden|gpu|cpu) control what engine output is compared against. The harness extracts golden outputs before execution, runs references via ReferenceCapabilityError-aware adapters, and reports unverifiable bundles. Includes CLI flags, env fallbacks, and unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Graph-only bundles without input data get their inputs synthesized per-op via free functions dispatched by node attribute type. RoleAccounting enforces that every owned leaf input is accounted for (FREE/STRUCTURED/DERIVED) — unaccounted inputs refuse the bundle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Rename FillOutcome → SynthesisResult and RoleAccounting → SynthesisTracker for clarity: SynthesisTracker tracks per-node input role declarations, SynthesisResult reports whether synthesis succeeded. - Rename RoleAccounting.hpp → SynthesisTracker.hpp to match the class name. - Expand input synthesis from 3 ops to all 19 supported attribute types (conv fwd/bwd/wrw, batchnorm inference/variance/training/backward, matmul, pointwise, reduction, layernorm fwd/bwd, rmsnorm fwd/bwd, resample, block-scale dequantize/quantize, sdpa fwd/bwd). - Remove single-node graph restriction — harness now loops over all nodes in a graph, supporting fused graphs (e.g. conv+bias+relu). - Improve comments throughout for reviewer clarity: explain what "owns" means, document the 4-step fill pattern, add per-function comments explaining why specific inputs are STRUCTURED or DERIVED. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move all method implementations out of the 831-line header-only harness into a .cpp file (~700 lines), leaving the header as declarations only (~210 lines). The harness is an internal test executable with virtual methods — no reason for it to be header-only. Add 9 unit tests for SynthesisTracker covering: all-FREE success, undeclared inputs, STRUCTURED/DERIVED refusals, uid-0 skip, non-owned uid skip, empty owned set, mixed failures, and factory methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Make the harness's mode dispatch testable by reading the verification mode through a protected virtual rather than directly from the TestConfig singleton. Add 10 unit tests covering every branch: auto mode with and without golden data, GPU→CPU fallback chain, golden/gpu/cpu explicit modes, and capability-miss skip paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…clang-tidy - Track gpuRefErrored in runAutoMode() so the skip message distinguishes GPU-ref-errored from GPU-ref-capability-miss when CPU ref also misses. - Extract resolveVerificationMode() and resolveGoldenDataDir() as free functions (CLI > env > nullopt precedence) and add 6 unit tests. - Fix readability-implicit-bool-conversion in SynthesizeInputs.hpp: change all if(!ptr) to if(ptr == nullptr). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously each fill function created its own SynthesisTracker with the whole-graph leaf input UIDs, so finish() rejected any UID not declared by that single node. For multi-node/fused graphs (e.g. conv+relu), the second node's finish() saw the first node's UIDs as "no role declared" and returned unsupported → SKIP. Fix: create one SynthesisTracker in synthesizeInputs(), pass it through all fill functions, call finish() once after all nodes have declared their UIDs. Each node now only accounts for its own inputs, and the final finish() verifies that the union covers all leaf inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-optional, runEngineOrSkip - Fill output tensors with NaN sentinel instead of zero so allClose's NaN guard catches unwritten elements (previously masked as computed zero) - Mark FP8 descale/scale factors as STRUCTURED (refuse to fabricate random values that cause identical saturation → vacuous pass), matching BlockScaleDequantize precedent - Make metadata mandatory only when golden outputs are present; graph-only bundles default-construct empty metadata instead of failing with MISSING_METADATA - Extract runEngineOrSkip() helper to deduplicate engine-run preamble across all three verification modes - Thread engine error message into skip reason (was silently discarded) - Make applyMetadataGuards() virtual so unit tests can override the hardware-guard path without initializing TestConfig singleton - Fix test name lint failures (suite/case naming convention) - Add SynthesisTracker precondition documentation - Add synthesizeInputs() phase documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ref path - Remove uid!=0 guard from SynthesisTracker::isOwned() — uid=0 is a valid tensor uid when the graph declares it as a real leaf input. The "absent optional" convention (uid 0 = not connected) is the fill function's responsibility, not the tracker's. - Guard ref-path device allocation on _requiresDevice so CI runners without a GPU don't trigger std::bad_alloc from rawDeviceData(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ghimire/ALMIOPEN-1970/golden_reference_fallback

deriveTestName() threw when a graph .json sat at the data root with no folder above it. That blocked the natural "--gd .../graph_only_bundle" invocation, where the user points the data root straight at a single bundle's folder. Use the root folder's own name as the suite name in that case instead of throwing, so the bundle is discovered as {folder}.{stem}. Update the unit test (JsonAtRootThrows -> JsonAtRootUsesFolderNameAsSuite) to assert the new behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extract shared TomlGuards.hpp and wire all three harnesses through it. Fix toleranceForNodeAttributes missing SdpaAttributes/SdpaBackwardAttributes. Add SynthesisTracker::fillComputed + tensorAt for follow-up init unification. Rename appendTensorDiff/appendFpDiff to writeTensorDiffReport/writeFpDiffReport. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-init unification branch) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ards calls Both harnesses now use the same shared free functions (skipIfTomlMatched, applyTomlToleranceOverride) from TomlGuards.hpp. The TOML lookup logic is already unit-tested in TestTestSettings.cpp. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Harness unification (ALMIOPEN-1969 follow-up): - Share input synthesis between both harnesses: move SynthesisTracker and SynthesizeInputs to harness/input_init/ (namespace hipdnn_integration_tests). The non-golden harness now drives its initializeBundle() through the same synthesis switch, with a random [-1,1] fallback. Remove 4 of 7 non-golden initializeBundle overrides whose ranges/seeds now match the shared fill functions; 3 remain (fused-graph range conflicts and a large-values stress test). - Rename harness/golden/ -> harness/bundle/ and namespace hipdnn_integration_tests::golden -> ::bundle. "golden" was inaccurate: the harness verifies bundles via golden data OR a GPU/CPU reference, and most bundles ship no golden output. Rename the class IntegrationGraphGoldenReferenceVerificationHarness -> IntegrationBundleVerificationHarness (parallel to its sibling IntegrationGraphVerificationHarness). Fix stale "golden" log/report strings that mislabeled all bundles or shared comparison paths. The golden verification *mode* and golden output *data* vocabulary, and the --golden-data-dir CLI flag, are retained (accurate / external). - Drop redundant hipInit/hipGetDevice/_deviceId from the graph harness: HIP initializes lazily and getSharedHandle()->hipdnnCreate() already does it before any graph runs; _deviceId was write-only. - Split bundle metadata guards: the VRAM check applies to every bundle the engine runs, but the GPU-arch lock only matters when comparing against golden output values (arch-specific). Gate the arch check on hasGoldenOutputs so an inputs-only bundle verified against a local reference is not wrongly skipped on a different arch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Both harnesses resolved comparison tolerance with duplicated, divergent code. Replace both with one shared resolver (harness/tolerance/ToleranceResolver.hpp) where the aggregation policy is a plain function (GraphWrapper, dtype) -> float, selectable per caller. What changed: - New resolveTolerance(wrapper, dtype, testName, atol, rtol, policy=max): derives a default via the chosen aggregation policy, then applies the TOML per-test override (highest priority) in ONE place. Single tolerance entry point for both harnesses; the override is no longer applied separately in either, so the layering order lives here alone. - Two policies, each a free function (no enum/switch; the policy IS the function, C++17 function pointer): * maxAcrossNodes — loosest per-node tolerance; conservative envelope, never tighter than any node so it cannot cause a false failure. * outputOpTolerance — tolerance of the last non-Pointwise op (the output-producing op); reproduces the graph harness's historical getTolerance() behavior. - Wiring preserves existing behavior on both sides: * Graph harness getTolerance() -> outputOpTolerance (its prior policy), now expressed through the shared resolver instead of a private dynamic_cast switch. It also serializes via to_binary() and reads the output dtype from the flatbuffer, so it shares the resolver keyed on the same representation the bundle harness uses. * Bundle harness -> maxAcrossNodes (its prior policy), unchanged. The two policies agree for the common one-real-op + activation case (activation is Pointwise -> skipped; the single real op is both loosest and last), so they differ only on multi-real-op fusions, which carry explicit overrides today. - Deletes the duplicate per-op tolerance switch that existed in each harness (one on dynamic_cast of live nodes, one on the flatbuffer NodeAttributes enum); both now share one flatbuffer-keyed lookup. - warnIfMultipleOutputs() guards the single-output assumption: every current policy reduces over the whole graph, not the per-output subgraph, so multi-output graphs are flagged loudly (deferred fix, ALMIOPEN-2216). Why output-op for the graph harness (not max): a mechanical migration of the C++ graph tests should not change their tolerances. outputOpTolerance keeps the exact prior numbers. Note this is a heuristic, not a principled tight bound (it attributes the whole output tolerance to one op); it is kept for migration parity. The principled tighten path is the future DynamicTolerances upgrade (see TODO in ToleranceResolver.hpp and ALMIOPEN-2216); FP4 also needs sub-bf16 entries in the dtype switch (currently falls through to 1e-3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bghimireamd and others added 19 commits June 22, 2026 13:25

Fix clang-tidy misc-const-correctness in TestSynthesisTracker

dec13b1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply clang-format to CI-flagged files

b067310

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'develop' of github.com:ROCm/rocm-libraries into users/b…

d2860b4

…ghimire/ALMIOPEN-1970/golden_reference_fallback

Remove fillComputed/tensorAt from SynthesisTracker (deferred to input…

75bf232

…-init unification branch) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

bghimireamd changed the base branch from develop to users/bghimire/ALMIOPEN-1969/tolerance_lookup June 24, 2026 05:00

github-actions Bot added the project: hipdnn-integration-tests label Jun 24, 2026

assistant-librarian Bot added the organization: ROCm label Jun 24, 2026

bghimireamd force-pushed the users/bghimire/ALMIOPEN-1969/tolerance_lookup branch 4 times, most recently from 67c0a2b to e8ff12f Compare June 26, 2026 19:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[HIPDNN] Tolerance lookup followup#8752

[HIPDNN] Tolerance lookup followup#8752
bghimireamd wants to merge 19 commits into
users/bghimire/ALMIOPEN-1969/tolerance_lookupfrom
users/bghimire/ALMIOPEN-1969/tolerance_lookup_followup

bghimireamd commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

bghimireamd commented Jun 24, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant